134 research outputs found

    Unsupervised Data Augmentation for Less-Resourced Languages with no Standardized Spelling

    Non-standardized languages are a challenge to the construction of representative linguistic resources and to the development of efficient natural language processing tools: when spelling is not determined by a consensual norm, a given word can appear in many alternative written forms, which yields a large proportion of out-of-vocabulary words. To embrace this diversity, we propose a methodology based on crowdsourcing alternative spellings, from which variation rules are automatically extracted. The rules are then used to match out-of-vocabulary words with one of their spelling variants. This virtuous process enables the unsupervised augmentation of multi-variant lexicons without requiring manual rule definition by experts. We apply this multilingual methodology to Alsatian, a French regional language, and provide (i) an intrinsic evaluation of the correctness of the obtained variant pairs and (ii) an extrinsic evaluation on a downstream task, part-of-speech tagging. We show that in a low-resource scenario, collecting spelling variants for only 145 words can lead to (i) the generation of 876 additional variant pairs and (ii) a reduction in out-of-vocabulary words that improves tagging performance by 1 to 4%.
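
The rule-extraction and matching steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `extract_rules` and `match_oov`, the use of Python's `difflib` for alignment, the single-rule-application limit, and the toy word pairs are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def extract_rules(pairs):
    """Collect substring substitution rules (old, new) from spelling-variant pairs."""
    rules = set()
    for a, b in pairs:
        matcher = SequenceMatcher(None, a, b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "replace":
                # Store both directions: with no spelling norm, neither form is canonical.
                rules.add((a[i1:i2], b[j1:j2]))
                rules.add((b[j1:j2], a[i1:i2]))
    return rules


def match_oov(word, lexicon, rules):
    """Return lexicon entries reachable from `word` by applying one rule at one position."""
    candidates = set()
    for old, new in rules:
        start = word.find(old)
        while start != -1:
            variant = word[:start] + new + word[start + len(old):]
            if variant in lexicon:
                candidates.add(variant)
            start = word.find(old, start + 1)
    return sorted(candidates)


# Toy, invented data (hypothetical; not taken from the paper's Alsatian corpus).
rules = extract_rules([("kolor", "color")])
print(match_oov("kamera", {"camera", "canal"}, rules))
```

A real system would also need to filter noisy rules, for example by frequency or context, which this sketch leaves out.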

    Éthique et TAL : ce dont on parle, ce dont on ne parle plus, ce dont on ne parle pas (un état de l'art)

    In recent years, ethics has become a recognized topic in AI and, more specifically, in natural language processing (NLP; TAL in French). This recent development stems from several factors, including the fact that NLP has become commercially profitable enough to leave research laboratories and pervade our daily lives, with consequences that are immediately visible to the general public. In this talk I will review how the topic has evolved over the past decade, during which some issues became self-evident (such as the remuneration of click workers) and ceased to be discussed, while others (notably the biases of language models) took center stage and overshadowed the most difficult questions. Ample time will be left for discussion, to allow an exchange of views on these topics.

    Extending the adverbial coverage of a French wordnet

    Proceedings of the NODALIDA 2009 workshop WordNets and other Lexical Semantic Resources — between Lexical Semantics, Lexicography, Terminology and Formal Ontologies. Editors: Bolette Sandford Pedersen, Anna Braasch, Sanni Nimb and Ruth Vatvedt Fjeld. NEALT Proceedings Series, Vol. 7 (2009), 33-37. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9209

    Extension et couplage de ressources syntaxiques et sémantiques sur les adverbes

    This paper presents work on extending the adverbial entries of WOLF, a semantic lexical resource for French, and connecting them with those of the syntactic lexicon Lefff, which were mostly extracted from the lexicon-grammar tables of Molinier & Levrier (2000). The work exploits derivation and synonymy relations; the latter are extracted from the DicoSyn synonym database. The resulting semantic resource, which is freely available, has been manually and exhaustively evaluated and validated.

    From the Ground Up: Developing a Practical Ethical Methodology for Integrating AI into Industry

    In this article we present a new approach to practical artificial intelligence (AI) ethics in heavy industry, developed in the context of an EU Horizon 2020 multi-partner project. We begin with a review of the concept of Industry 4.0, discussing the limitations of the concept, and of iterative categorization of heavy industry generally, for a practical human-centered ethical approach. We then give an overview of actual and potential AI ethics approaches to heavy industry, suggesting that current approaches, with their emphasis on broad high-level principles, are not well suited to AI ethics for industry. From there we outline our own approach in two sections. The first suggests tailoring ethics to the situation, in time and space, of the shop-floor worker from the ground up, including giving specific and evolving ethical recommendations. The second describes the ethicist's role as an ethical supervisor immersed in the development process and interpreting between industrial and technological (tech) development partners. In presenting our approach we draw heavily on our own experiences in applying the method in the Use Cases of our project, as examples of what can be done.

    Les jeux ayant un but : des sciences participatives ?

    Games with a purpose are games that conceal their real purpose: producing data. They are therefore a form of crowdsourcing, which covers both microworking and volunteer platforms such as Wikipedia. To what extent can these games be considered part of citizen science? What are their specific features?

    Éthique et traitement automatique des langues


    Sciences participatives et diversité linguistique Retours d'expériences

    Some languages suffer from a lack of resources in the broad sense, whether human, linguistic, or financial, in particular for producing the language processing tools needed for their digital integration. For these so-called "less-resourced" languages, crowdsourcing appears to be a promising way to take advantage of the growing presence of speakers on the Internet.

    Ethical Internal Logistics 4.0: Observations and Suggestions from a Working Internal Logistics Case

    In this paper we present our experiences and insights from a Use Case in heavy industry, where optical character recognition (OCR) is combined with algorithms to correctly identify labels for additives to be introduced into a production process. Ethical issues are presented relative to the effects of the Use Case upon the shop-floor operators using the new technology. We then discuss the recommendations we gave and our success in getting them implemented. An argument follows regarding what we view as the source of many of the ethical issues: the unreflective acceptance of Industry 4.0 and Internal Logistics 4.0 as a generalized and idealized 'plan' to which technological development and the human operator have to adapt. We contrast this with an approach in which the needs of the human in the work context would drive and limit Internal Logistics 4.0 development as a set of gradual improvements tailored to the worker's situation.